Abstract

In this paper, we consider the control problem with the Average-Value-at-Risk
(AVaR) criteria of the possibly unbounded $L^1$-costs in infinite horizon on a Markov
Decision Process (MDP). With a suitable state aggregation and by choosing a priori
a global variable $s$ heuristically, we show that there exist optimal policies for the
infinite horizon problem.
1 Introduction

In classical models, the optimization problem has been solved by expected performance
criteria. Beginning with Bellman [B], risk neutral performance evaluation has been used
via dynamic programming techniques. This methodology has seen huge development
both in theory and practice since then. However, in practice expected values are not
always appropriate to measure performance. For this reason, risk-averse approaches
have been developed to assess the corresponding problem and its outcomes, specifically
by utility functions (see e.g. [?]). To put risk-averse preferences into an axiomatic
framework, with the seminal paper of Artzner et al. [2], the risk assessment gained new
aspects for random outcomes. In [2], the concept of a risk measure was defined and a
theoretical framework was established. We will use this framework to measure risk
aversion. We replace the risk neutral expectation operator with this risk averse operator,
study the optimal control of the infinite sum of cost functions, and characterize the optimal
policy as stationary, as in the risk neutral case, but in a state-aggregated setting.

The rest of the paper is organized as follows. In Section 2, we give the preliminary theoretical
framework. In Section 3, we derive the dynamic programming equations for the MDP using
the AVaR criteria for the infinite time horizon, and we conclude the paper by applying
our results to the classical LQ problem.

1.1 Controlled Markov Processes 

We take the control model $\mathcal{M} = \{\mathcal{M}_n, n \in \mathbb{N}_0\}$, where for each $n \in \mathbb{N}_0$,

$$\mathcal{M}_n := (X_n, A_n, c_n) \tag{1.1}$$

with the following components:

• $X_n$ and $A_n$ denote the state and action (or control) spaces, where $X_n$ takes values in
a Borel set $X$ whereas $A_n$ takes values in a Borel set $A$.

• For each $x \in X_n$, let $A_n(x) \subset A_n$ be the set of all admissible controls in the state
$x_n = x$. Then

$$\mathbb{K}_n := \{(x, a) : x \in X_n,\ a \in A_n(x)\} \tag{1.2}$$

stands for the set of feasible state-action pairs at time $n$, where we assume that $\mathbb{K}_n$
is a Borel subset of $X_n \times A_n$.

• We let $x_{n+1} = F_n(x_n, a_n, \xi_n)$ for all $n = 0, 1, \ldots$ with $x_n \in X_n$ and $a_n \in A_n$ as
described above, with independent random disturbances $\xi_n \in S_n$ having probability
distributions $\mu_n$, where the $S_n$ are Borel spaces.

• $c_n(x, a) : \mathbb{K}_n \to \mathbb{R}$ stands for the deterministic cost function at stage $n \in \mathbb{N}_0$ with
$(x, a) \in \mathbb{K}_n$.

The random variables $\{\xi_n\}_{n \ge 0}$ are defined on a common probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_n\}_{n \ge 0}, \mathbb{P})$,
where $\mathbb{P}$ is the reference probability measure and each $\xi_n$ is measurable with respect to the sigma-algebra
$\mathcal{F}_n$, with $\mathcal{F} = \sigma(\cup_{n=0}^{\infty} \mathcal{F}_n)$. For the action $a \in A_n(x)$ chosen at time $n$, we
assume that $A_n$ is $\mathcal{F}_n = \sigma(X_0, A_0, \ldots, X_n)$-measurable, i.e. our decision might depend
entirely on the history $h_n$, where $h_n = (x_0, a_0, x_1, \ldots, a_{n-1}, x_n) \in H_n$ is the history up to
time $n$, and where we define recursively

$$H_0 := X, \qquad H_{n+1} := H_n \times A \times X. \tag{1.3}$$

For each $n \in \mathbb{N}_0$, let $\mathbb{F}_n$ be the family of measurable functions $f_n : H_n \to A_n$ such that

$$f_n(x) \in A_n(x) \tag{1.4}$$

for all $x \in X_n$. A sequence $\pi = \{f_n\}$ of functions $f_n \in \mathbb{F}_n$ for all $n \in \mathbb{N}_0$ is called a policy.
We denote by $\Pi$ the set of all policies. Then for each policy $\pi \in \Pi$ and initial state
$x \in X$, a stochastic process $\{(x_n, a_n)\}$ and a probability measure $P^{\pi}_x$ are defined on $(\Omega, \mathcal{F})$
in a canonical way, where $x_n$ and $a_n$ represent the state and the control at time $n \in \mathbb{N}_0$.
The expectation operator with respect to $P^{\pi}_x$ is denoted by $E^{\pi}_x$. The distribution of $X_{n+1}$
is given by the transition kernel $Q$ from $X \times A$ to $X$ as follows:

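$$\begin{aligned}
P^{\pi}\big(X_{n+1} \in B_x \mid X_0, f_0(X_0), \ldots, X_n, f_n(X_0, A_0, \ldots, X_n)\big)
&= P^{\pi}\big(X_{n+1} \in B_x \mid X_n, f_n(X_0, A_0, \ldots, X_n)\big) \\
&= Q\big(B_x \mid X_n, f_n(X_0, A_0, \ldots, X_n)\big)
\end{aligned}$$

for Borel measurable sets $B_x \subset X$. A Markov policy is of the form

$$P^{\pi}\big(X_{n+1} \in B_x \mid X_n, f_n(X_0, A_0, \ldots, X_n)\big) = Q\big(B_x \mid X_n, f_n(X_n)\big). \tag{1.5}$$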

That is to say, the Markov policy $\pi = \{f_n\}_{n \ge 0}$ depends only on the current state $X_n$. We
denote the set of all Markovian policies by $\Pi^M$. Similarly, a stationary policy is of the
form $\pi = \{f\}_{n \ge 1}$ with

$$P^{\pi}\big(X_{n+1} \in B_x \mid X_n, f_n(X_0, A_0, \ldots, X_n)\big) = Q\big(B_x \mid X_n, f(X_n)\big), \tag{1.6}$$

i.e. we apply the same rule at each time episode $n$. Suppose we are given a policy $\pi =
\{f_n\}_{n=0}^{\infty}$; then by the Ionescu Tulcea theorem [7], there exists a unique probability measure
$P^{\pi}$ on $(\Omega, \mathcal{F})$, which ensures the consistency of the infinite horizon problem considered.
Hence, for every measurable set $B \subset X$ and all $h_n \in H_n$, $n \in \mathbb{N}_0$, we denote

$$P^{\pi}(x_1 \in B) = P(B),$$

$$P^{\pi}(x_{n+1} \in B \mid h_n) = Q(B \mid x_n, \pi_n(h_n)).$$


We consider the following cost function

$$C^{\infty} := \sum_{n=0}^{\infty} c_n(x_n, a_n) \tag{1.7}$$

for the infinite planning horizon and

$$C^{N} := \sum_{n=0}^{N} c_n(x_n, a_n) \tag{1.8}$$

for the finite planning horizon for some terminal time $N \in \mathbb{N}_0$. We assume that the cost
functions $\{c_n(x_n, a_n)\}_{n \ge 0}$ are non-negative and that $C^{N}$ and $C^{\infty}$ belong to the space $L^1(\Omega, \mathcal{F}, \mathbb{P}_0)$.

We start from the following two well-studied optimization problems for controlled Markov
processes. The first one is called the finite horizon expected value problem, where we want to
find a policy $\pi = \{g_n\}_{n=0}^{N}$ that minimizes the expected cost:

$$\min_{\pi \in \Pi} E^{\pi}_x \Big[ \sum_{n=0}^{N} c_n(x_n, a_n) \Big],$$

where $a_n = \pi_n(x_0, x_1, \ldots, x_n)$ and $c_n(x_n, a_n)$ is measurable for each $n = 0, \ldots, N$. The second
problem is the infinite horizon expected value problem. The objective is to find a policy
$\pi = \{g_n\}_{n=0}^{\infty}$ that minimizes the expected cost:

$$\min_{\pi \in \Pi} E^{\pi}_x \Big[ \sum_{n=0}^{\infty} c_n(x_n, a_n) \Big].$$

Under some assumptions, the first optimization problem has a solution in the form of Markov
policies, whereas in the infinite horizon case the optimal policy is stationary. In both cases, the optimal policies
can be found by solving the corresponding dynamic programming equations. Our goal is to
study the infinite horizon problem, where we use a risk-averse operator $\rho$ instead of the
expectation operator and look for a stationary optimal policy under some conditions.

1.2 Coherent risk measures on $L^1$

We introduce the corresponding risk averse operators that we will be working with throughout
the rest of the paper.

Definition 1.1. A function $\rho : L^1 \to \mathbb{R}$ is said to be a coherent risk measure if it satisfies
the following axioms:

• $\rho(\lambda X + (1 - \lambda) Y) \le \lambda \rho(X) + (1 - \lambda) \rho(Y)$, $\forall \lambda \in (0, 1)$, $X, Y \in L^1$;

• If $X \le Y$ $\mathbb{P}$-a.s. then $\rho(X) \le \rho(Y)$, $\forall X, Y \in L^1$;

• $\rho(c + X) = c + \rho(X)$, $\forall c \in \mathbb{R}$, $X \in L^1$;

• $\rho(\beta X) = \beta \rho(X)$, $\forall X \in L^1$, $\beta \ge 0$.

The particular risk averse operator that we will be working with is $\mathrm{AVaR}_{\alpha}(X)$.

Definition 1.2. Let $X \in L^1(\Omega, \mathcal{F}, \mathbb{P})$ be a real-valued random variable and let
$\alpha \in (0, 1)$.

• We define the Value-at-Risk of $X$ at level $\alpha$, $\mathrm{VaR}_{\alpha}(X)$, by

$$\mathrm{VaR}_{\alpha}(X) = \inf\{x \in \mathbb{R} : \mathbb{P}(X \le x) \ge \alpha\}. \tag{1.9}$$

• We define the coherent risk measure, the Average-Value-at-Risk of $X$ at level $\alpha$,
denoted by $\mathrm{AVaR}_{\alpha}(X)$, as

$$\mathrm{AVaR}_{\alpha}(X) = \frac{1}{1 - \alpha} \int_{\alpha}^{1} \mathrm{VaR}_{t}(X)\, dt. \tag{1.10}$$


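We will also need the following alternative representation for $\mathrm{AVaR}_{\alpha}(X)$, as shown in [15].

Lemma 1.1. Let $X \in L^1(\Omega, \mathcal{F}, \mathbb{P})$ be a real-valued random variable and let $\alpha \in (0, 1)$.
Then it holds that

$$\mathrm{AVaR}_{\alpha}(X) = \min_{s \in \mathbb{R}} \Big\{ s + \frac{1}{1 - \alpha} E[(X - s)^{+}] \Big\}, \tag{1.11}$$

where the minimum is attained at $s = \mathrm{VaR}_{\alpha}(X)$.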
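The two descriptions of AVaR in (1.10) and (1.11) can be compared numerically on a finite sample. The following is a minimal sketch, assuming an empirical (equally weighted) distribution; the function names `empirical_var` and `avar_objective` are illustrative and not part of the paper.

```python
import numpy as np

def empirical_var(samples, alpha):
    """Empirical VaR_alpha: the smallest sample value x with P(X <= x) >= alpha."""
    xs = np.sort(np.asarray(samples, dtype=float))
    k = max(int(np.ceil(alpha * len(xs))), 1)  # points needed to reach mass alpha
    return xs[k - 1]

def avar_objective(samples, alpha, s):
    """The map s -> s + E[(X - s)^+] / (1 - alpha) from representation (1.11)."""
    xs = np.asarray(samples, dtype=float)
    return s + np.mean(np.maximum(xs - s, 0.0)) / (1.0 - alpha)

def empirical_avar(samples, alpha):
    """AVaR_alpha evaluated through (1.11) at the minimizer s = VaR_alpha."""
    return avar_objective(samples, alpha, empirical_var(samples, alpha))

if __name__ == "__main__":
    data = np.arange(1, 11)  # the sample 1, 2, ..., 10
    print(empirical_var(data, 0.5), empirical_avar(data, 0.5))
```

For the sample 1, ..., 10 and level 0.5, the value of (1.11) coincides with the average of the upper half of the sample, which is what (1.10) describes for an equally weighted sample.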


Remark 1.2. We note from the representation above that $\mathrm{AVaR}_{\alpha}(X)$ is real-valued
for any $X \in L^1(\Omega, \mathcal{F}, \mathbb{P})$.


1.3 Time Consistency 

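Definition 1.3. Let $L^1(\mathcal{F}_n)$ be the vector space of all real-valued, $\mathcal{F}_n$-measurable random
variables on the space $(\Omega, \mathcal{F}, \{\mathcal{F}_n\}, \mathbb{P})$ defined above. A one-step coherent dynamic risk
measure on $L^1(\mathcal{F}_{n+1})$ is a sequence of mappings

$$\rho_n : L^1(\mathcal{F}_{n+1}) \to L^1(\mathcal{F}_n), \quad n = 0, \ldots, N - 1, \tag{1.12}$$

that satisfy the following:

• $\rho_n(\lambda X + (1 - \lambda) Y) \le \lambda \rho_n(X) + (1 - \lambda) \rho_n(Y)$, $\forall \lambda \in (0, 1)$, $X, Y \in L^1(\mathcal{F}_{n+1})$;

• If $X \le Y$ $\mathbb{P}$-a.s. then $\rho_n(X) \le \rho_n(Y)$, $\forall X, Y \in L^1(\mathcal{F}_{n+1})$;

• $\rho_n(c + X) = c + \rho_n(X)$, $\forall c \in L^1(\mathcal{F}_n)$, $X \in L^1(\mathcal{F}_{n+1})$;

• $\rho_n(\beta X) = \beta \rho_n(X)$, $\forall X \in L^1(\mathcal{F}_{n+1})$, $\beta \ge 0$.

Definition 1.4. A dynamic risk measure $(\rho_n)_{n=0}^{N-1}$ on $L^1(\mathcal{F}_N)$ is called time-consistent if
for all $X, Y \in L^1(\mathcal{F}_N)$ and $n = 0, \ldots, N - 1$, $\rho_{n+1}(X) \ge \rho_{n+1}(Y)$ implies $\rho_n(X) \ge \rho_n(Y)$.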

Another way to define time consistency is from the point of view of optimal policies
(see also [20]). Intuitively, the sequence of optimization problems is said to be dynamically
consistent if the optimal strategies obtained when solving the original problem at time
$t$ remain optimal for all subsequent problems. More precisely, if a policy $\pi$ is optimal on
the time interval $[s, T]$, then it is also optimal on the sub-interval $[t, T]$ for every $t$ with
$s \le t \le T$.

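Remark 1.3. Given that the probability space is atomless, it is shown in [?] and [?]
that the only law invariant coherent risk measure operators $\rho$ on $(\Omega, \mathcal{F}, \mathbb{P})$, i.e.

$$\rho(X) = \rho(Y) \quad \text{whenever } X \text{ and } Y \text{ have the same distribution}, \tag{1.13}$$

satisfying

$$\rho(Z) = \rho\big(\rho|_{\mathcal{F}_1}(\cdots \rho|_{\mathcal{F}_{N-1}}(Z))\big) \tag{1.14}$$

for all random variables $Z$ are the $\operatorname{ess\,sup}(Z)$ and expectation $E(Z)$ operators. This suggests
that optimization problems with most coherent risk measures are not time consistent.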


2 Infinite Horizon Problem 


We are interested in solving the following optimization problem in the infinite horizon:

$$\min_{\pi \in \Pi} \mathrm{AVaR}^{\pi}_{\alpha} \Big( \sum_{n=0}^{\infty} c(x_n, a_n) \Big). \tag{2.15}$$


First, we put the following assumptions on the problem. 

Assumption 2.1. For every $n \in \mathbb{N}_0$, we impose the following assumptions on the problem:

1. The cost function $c_n : \mathbb{K}_n \to \mathbb{R}$ is nonnegative and lower semicontinuous (l.s.c.), that is,
if $(x_k, a_k) \to (x, a)$, then

$$\liminf_{k \to \infty} c_n(x_k, a_k) \ge c_n(x, a), \tag{2.16}$$

and inf-compact on $\mathbb{K}_n$, i.e., for every $x \in X_n$ and $r \in \mathbb{R}$, the set

$$\{a \in A_n(x) \mid c_n(x, a) \le r\} \tag{2.17}$$

is compact.

2. The function $(x, a) \mapsto \int v(x', s - c)\, Q(dx' \mid x, a)$ is l.s.c. for all l.s.c. functions $v \ge 0$.

3. The set $A_n(x)$ is compact for every $x \in X_n$ and for every $n \in \mathbb{N}_0$.

4. The system function $x_{n+1} = F_n(x_n, a_n, \xi_n)$ is measurable as a mapping
$F_n : X_n \times A_n \times S_n \to X_{n+1}$, and $(x, a) \mapsto F_n(x, a, s)$ is continuous on $\mathbb{K}_n$ for every
$s \in S_n$.

5. The multifunction (also called point-to-set function) $x \mapsto A_n(x)$, from $X$ to $A$, is upper
semicontinuous (u.s.c.), that is, if $\{x_n\} \subset X$ and $\{a_n\} \subset A$ are sequences such that

$$x_n \to x^*, \quad a_n \in A(x_n) \text{ for all } n, \quad \text{and } a_n \to a^*, \tag{2.18}$$

then $a^*$ is in $A_n(x^*)$.

6. There exists a policy $\pi \in \Pi$ such that $V_0(x, \pi) < M$ for all $x \in X_0$.

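To solve (2.15), we first rewrite the infinite horizon problem as follows:

$$\begin{aligned}
\inf_{\pi \in \Pi} \mathrm{AVaR}^{\pi}_{\alpha}(C^{\infty} \mid X_0 = x)
&= \inf_{\pi \in \Pi} \inf_{s \in \mathbb{R}} \Big\{ s + \frac{1}{1 - \alpha} E^{\pi}_x[(C^{\infty} - s)^{+}] \Big\} \\
&= \inf_{s \in \mathbb{R}} \inf_{\pi \in \Pi} \Big\{ s + \frac{1}{1 - \alpha} E^{\pi}_x[(C^{\infty} - s)^{+}] \Big\} \\
&= \inf_{s \in \mathbb{R}} \Big\{ s + \frac{1}{1 - \alpha} \inf_{\pi \in \Pi} E^{\pi}_x[(C^{\infty} - s)^{+}] \Big\}.
\end{aligned}$$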

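Based on this representation, we investigate the inner optimization problem for finite time
as in [1]. Let $n = 0, 1, 2, \ldots, N$. We define

$$W_{n\pi}(x, s) := E^{\pi}_x[(C^{n} - s)^{+}], \quad x \in X,\ s \in \mathbb{R},\ \pi \in \Pi, \tag{2.19}$$

$$W_{n}(x, s) := \inf_{\pi \in \Pi} W_{n\pi}(x, s), \quad x \in X,\ s \in \mathbb{R}. \tag{2.20}$$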

We work with the Markov Decision Model with a 2-dimensional state space $\tilde{X} = X \times \mathbb{R}$.
The second component of the state $(x_n, s_n) \in \tilde{X}$ gives the relevant information about the
history of the process. We take that there is no running cost and we assume that the
terminal cost function is given by $V_{-1\pi}(x, s) := V_{-1}(x, s) := s^{-}$. We take the decision
rules $f_n : \tilde{X} \to A$ such that $f_n(x, s) \in A_n(x)$ and denote by $\Pi^M$ the set of Markov policies
$\sigma = (f_0, f_1, \ldots)$, where the $f_n$ are decision rules. Here, by Markov policy, we mean that the
decision at time $n$ depends only on the current state $x$ as well as on the global variable
$s$, as will be seen in the proof below. We denote for

$$v \in M(\tilde{X}) := \{v : \tilde{X} \to \mathbb{R}_{+} : \text{measurable}\} \tag{2.21}$$

the operators

$$Lv(x, s, a) := \int v(x', s - c)\, Q(dx' \mid x, a), \quad (x, s) \in \tilde{X},\ a \in A_n(x), \tag{2.22}$$

and

$$T_f v(x, s) := \int v(x', s - c)\, Q(dx' \mid x, f(x, s)), \quad (x, s) \in \tilde{X}.$$

The minimal cost operator of the Markov Decision Model is given by

$$Tv(x, s) = \inf_{a \in A_n(x)} Lv(x, s, a). \tag{2.23}$$

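For a policy $\sigma = (f_0, f_1, f_2, \ldots) \in \Pi^M$, we denote by $\vec{\sigma} = (f_1, f_2, \ldots)$ the shifted policy.
We define for $\sigma \in \Pi^M$ and $n = -1, 0, 1, \ldots, N$:

$$V_{n+1, \sigma} := T_{f_0} V_{n, \vec{\sigma}},$$

$$V_{n+1} := \inf_{\sigma} V_{n+1, \sigma} = T V_n.$$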

A decision rule $f_n^*$ with the property that $V_n = T_{f_n^*} V_{n-1}$ is called the minimizer of $V_n$. We
have Markovian policies $\Pi^M \subset \Pi$ in the following sense: given the global variable $s$, for
every $\sigma = (f_0, f_1, \ldots) \in \Pi^M$ we find a policy $\pi = (g_0, g_1, \ldots) \in \Pi$ such that

$$g_0(x_0) := f_0(x_0, s),$$
$$g_1(x_0, a_0, x_1) := f_1(x_1, s - c_0),$$
$$\vdots$$

We remark here that a Markovian policy $\sigma = (f_0, f_1, \ldots) \in \Pi^M$ also depends on the
history of the process, but not on the whole information. The necessary information at
time $n$ of the history $h_n = (x_0, a_0, x_1, \ldots, a_{n-1}, x_n)$ is the state $x_n$ and the
quantity $s_n = s_0 - c_0 - c_1 - \cdots - c_{n-1}$. This dependence on the past and the optimality
of the Markovian policy are shown in the following theorem.
Theorem 2.2. For a given policy $\sigma$, the only necessary information at time $n$ of the
history $h_n = (x_0, a_0, \ldots, a_{n-1}, x_n)$ is the following:

• the state $x_n$

• the value $s_n = s - c_0 - c_1 - \cdots - c_{n-1}$ for $n = 1, 2, \ldots, N$.

Moreover, it holds for $n = 0, 1, \ldots, N$ that

• $W_{n\sigma} = V_{n\sigma}$ for $\sigma \in \Pi^M$.

• $W_n = V_n$.

If there exist minimizers $f_n^*$ of $V_n$ on all stages, then the Markov policy $\sigma^* = (f_0^*, \ldots, f_N^*)$
is optimal for the problem

$$\inf_{\pi \in \Pi} E^{\pi}_x[(C^{N} - s)^{+}]. \tag{2.24}$$

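Proof. For $n = 0$, we obtain

$$\begin{aligned}
V_{0\sigma}(x, s) &= T_{f_0} V_{-1}(x, s) \\
&= \int V_{-1}(x', s - c)\, Q(dx' \mid x, f_0(x, s)) \\
&= \int (s - c)^{-}\, Q(dx' \mid x, f_0(x, s)) \\
&= \int (c - s)^{+}\, Q(dx' \mid x, f_0(x, s)) \\
&= E^{\sigma}_x[(C^{0} - s)^{+}] = W_{0\sigma}(x, s).
\end{aligned}$$

Next, by the induction argument,

$$\begin{aligned}
V_{n+1, \sigma}(x, s) &= T_{f_0} V_{n \vec{\sigma}}(x, s) \\
&= \int V_{n \vec{\sigma}}(x', s - c)\, Q(dx' \mid x, f_0(x, s)) \\
&= \int E^{\vec{\sigma}}_{x'}[(C^{n} - (s - c))^{+}]\, Q(dx' \mid x, f_0(x, s)) \\
&= \int E^{\vec{\sigma}}_{x'}[(c + C^{n} - s)^{+}]\, Q(dx' \mid x, f_0(x, s)) \\
&= E^{\sigma}_x[(C^{n+1} - s)^{+}] = W_{n+1, \sigma}(x, s).
\end{aligned}$$

We note that the history of the Markov Decision Process $\tilde{h}_n = (x_0, s_0, a_0, x_1, s_1, a_1, \ldots, x_n, s_n)$
contains the history $h_n = (x_0, a_0, x_1, a_1, \ldots, x_n)$. We denote by $\tilde{\Pi}$ the history dependent policies
of the Markov Decision Process. By ([5], Theorem 2.2.3), we get

$$\inf_{\sigma \in \Pi^M} V_{n\sigma}(x, s) = \inf_{\tilde{\pi} \in \tilde{\Pi}} V_{n \tilde{\pi}}(x, s).$$

Hence, we obtain

$$\inf_{\sigma \in \Pi^M} W_{n\sigma} \ge \inf_{\pi \in \Pi} W_{n\pi} \ge \inf_{\tilde{\pi} \in \tilde{\Pi}} V_{n \tilde{\pi}} = \inf_{\sigma \in \Pi^M} V_{n\sigma} = \inf_{\sigma \in \Pi^M} W_{n\sigma}.$$

We conclude the proof. □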


Theorem 2.3. Under the conditions of Assumption 2.1, there exists an optimal
Markov policy, in the sense introduced above, $\sigma^* \in \Pi$ for any finite horizon $N \in \mathbb{N}_0$ with

$$\inf_{\pi \in \Pi} E^{\pi}_x[(C^{N} - s)^{+}] = E^{\sigma^*}_x[(C^{N} - s)^{+}]. \tag{2.25}$$

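Now we are ready to state our main result.

Theorem 2.4. Under Assumption 2.1, there exists an optimal Markov policy $\pi^*$ for the
infinite horizon problem (2.15).

Proof. By Assumption 2.1, we have

$$\begin{aligned}
W_{\infty, \pi}(x, s) &= E^{\pi}_x[(C^{\infty} - s)^{+}] \\
&= E^{\pi}_x\Big[\Big(C^{n} + \sum_{k=n+1}^{\infty} c_k - s\Big)^{+}\Big] \\
&\le E^{\pi}_x[(C^{n} - s)^{+}] + E^{\pi}_x\Big[\sum_{k=n+1}^{\infty} c_k\Big] \\
&\le E^{\pi}_x[(C^{n} - s)^{+}] + M(n),
\end{aligned} \tag{2.26}$$

where $M(n) \to 0$ as $n \to \infty$ due to Assumption 2.1. Taking the infimum over all
$\pi \in \Pi$ we get

$$W_{\infty}(x, s) \le W_n(x, s) + M(n). \tag{2.27}$$

Hence we get

$$W_n \le W_{\infty}(x, s) \le W_n + M(n). \tag{2.28}$$

Letting $n \to \infty$, we get

$$\lim_{n \to \infty} W_n = W_{\infty}. \tag{2.29}$$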


Moreover, by Theorem 2.3 there exists $\pi^* = \{f_n\}_{n=0}^{N} \in \Pi$ such that $V^{\pi^*}_N(x) = V^*_N(x)$,
and we also have by the assumption that $V^*_N(x)$ is l.s.c. By the nonnegativity of the cost
functions $c_n \ge 0$, we have that $N \mapsto V^*_N(x)$ is nondecreasing and $V^*_N(x) \le V^*_{\infty}(x)$ for
all $x \in X$. Denote

$$u(x) := \sup_{N \ge 0} V^*_N(x). \tag{2.30}$$

Then $u(x)$, being the supremum of l.s.c. functions, is l.s.c. as well. Letting $N \to \infty$, we
have $u(x) \le V^*_{\infty}(x)$. Hence $V^*_{\infty}$ is l.s.c. as well. We state that the optimal policies
exist by Theorem 2.3 and by Theorem 4.2.3 in [13], and hence conclude the proof. □


Remark 2.5. We recall that our optimization problem is

$$\inf_{\pi \in \Pi} \mathrm{AVaR}^{\pi}_{\alpha}\Big(\sum_{n=0}^{\infty} c(x_n, a_n)\Big), \tag{2.31}$$

which is equivalent to

$$\inf_{s \in \mathbb{R}} \Big\{ s + \frac{1}{1 - \alpha} \inf_{\pi \in \Pi} E^{\pi}_x\Big[\Big(\sum_{n=0}^{\infty} c(x_n, a_n) - s\Big)^{+}\Big] \Big\}. \tag{2.32}$$

Hence, we fix the global variable $s$ a priori as

$$s = \mathrm{VaR}^{\pi_0}_{\alpha}(C^{\infty}), \tag{2.33}$$

where $\mathrm{VaR}^{\pi_0}_{\alpha}(C^{\infty})$ is determined using the reference probability measure $\mathbb{P}_0$. It is claimed in
[?] that by fixing the global variable $s$, the resulting optimization problem would turn out to
be over $\mathrm{AVaR}_{\beta}(C^{\infty})$, where possibly $\beta \ne \alpha$, under some regularity assumptions. But it is
not clear to us what these regularity conditions would be for that to hold and why it should
necessarily be the case, since for each fixed $s$, the inner optimization problem in Equation
(2.31) has an optimal policy $\pi(s)$ depending on $s$. Hence, as in [?], we focus on the inner
optimization problem, but we fix the global variable $s$ heuristically a priori as $\mathrm{VaR}^{\pi_0}_{\alpha}(C^{N})$
with respect to the reference probability measure $\mathbb{P}$ and then solve the optimization problem for
each path $\omega$ conditionally with respect to the filtration $\mathcal{F}_n$ at each time $n \in \mathbb{N}_0$, namely by taking
into account whether for that path $s_n \le 0$ or $s_n > 0$. Hence, by denoting $s_n = s - C^{n}$, the
optimization problem reduces to the classical risk neutral optimization problem for that path
$\omega$ whenever $s_n \le 0$. We treat this classical case (see [?]) in the subsection below.
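3 Solving the case when the global variable $s_n \le 0$

Recall that the inner optimization problem is

$$\begin{aligned}
V^*_0(x) &= \frac{1}{1 - \alpha} \inf_{\pi \in \Pi} E^{\pi}_x[(C^{\infty} - s)^{+}] \\
&= \frac{1}{1 - \alpha} \inf_{\pi \in \Pi} E^{\pi}_x\Big[\Big(\sum_{n=N+1}^{\infty} c(x_n, a_n) - (s - C^{N})\Big)^{+}\Big] && (3.34) \\
&= \frac{1}{1 - \alpha} \inf_{\pi \in \Pi} E^{\pi}_x\Big[\Big(\sum_{n=N+1}^{\infty} c(x_n, a_n) - s_N\Big)^{+}\Big] && (3.35) \\
&= \frac{1}{1 - \alpha} \inf_{\pi \in \Pi} E^{\pi}_x\Big[\sum_{n=N+1}^{\infty} c(x_n, a_n) - s_N\Big] \quad \text{whenever } s_N \le 0. && (3.36)
\end{aligned}$$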

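Hence, whenever $s_n \le 0$, we have a risk neutral optimization problem on that path $\omega$.
Namely,

$$V^{\pi}_{n+1}(x) = E^{\pi}_x\Big[\sum_{i=n+1}^{\infty} \frac{c_i(x_i, \pi_i)}{1 - \alpha}\Big] - \frac{s_n}{1 - \alpha}. \tag{3.37}$$

Without loss of generality, with some abuse of notation, we take

$$\frac{c_i(x_i, \pi_i)}{1 - \alpha} \triangleq c_i(x_i, \pi_i) \tag{3.38}$$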

for all $i \in \mathbb{N}_0$. That is to say, $V^{\pi}_n(x_n)$ is the total cost from time $n$ onwards for that
particular path $\omega$, where $n = \min\{m \in \mathbb{N}_0 : s_m \le 0\}$, given the initial condition $x_n$. The
corresponding minimal cost is then

$$V^*_n(x_n) := \inf_{\pi \in \Pi} V^{\pi}_n(x_n). \tag{3.39}$$

We also note that for any two integers $N > n \ge 0$,

$$V^{\pi}_n(x) = V^{\pi}_{n,N}(x) + V^{\pi}_{N,\infty}(x), \tag{3.40}$$

where

$$V^{\pi}_{n,N}(x) := E^{\pi}_x\Big[\sum_{i=n}^{N-1} c_i(x_i, a_i)\Big] \tag{3.41}$$

is the $(N - n)$-step cost when using the policy $\pi$, starting at $x_n = x$, and

$$V^{\pi}_{N,\infty}(x) := E^{\pi}_x\Big[\sum_{i=N}^{\infty} c_i(x_i, a_i)\Big] \tag{3.42}$$

is the tail cost from time $N$ onwards. Let

$$V^*_{n,N}(x) := \inf_{\pi \in \Pi} V^{\pi}_{n,N}(x). \tag{3.43}$$


We need the following two technical lemmas.

Lemma 3.1. Fix an arbitrary $n \in \mathbb{N}_0$. Let $\mathbb{K}_n$ be as in the assumptions, and let $u : \mathbb{K}_n \to \mathbb{R}$
be a given measurable function. Define

$$u^*(x) := \inf_{a \in A_n(x)} u(x, a), \quad \text{for all } x \in X_n. \tag{3.44}$$

• If $u$ is nonnegative, l.s.c. and inf-compact on $\mathbb{K}_n$, then there exists $\pi_n \in \mathbb{F}_n$ such
that

$$u^*(x) = u(x, \pi_n(x)), \quad \text{for all } x \in X, \tag{3.45}$$

and $u^*$ is measurable.

• If, in addition, Assumption 2.1 holds, then $u^*$ is l.s.c.

Proof. See [25]. □

Lemma 3.2. For every $N > n \ge 0$, let $w_n$ and $w_{n,N}$ be functions on $\mathbb{K}_n$ which are
nonnegative, l.s.c. and inf-compact on $\mathbb{K}_n$. If $w_{n,N} \uparrow w_n$ as $N \to \infty$, then

$$\lim_{N \to \infty} \min_{a \in A_n(x)} w_{n,N}(x, a) = \min_{a \in A_n(x)} w_n(x, a) \tag{3.46}$$

for all $x \in X$.

Proof. See [13], page 47. □

For $n \in \mathbb{N}_0$ we denote

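$$V^*_{n,N}(x, s) := \inf_{\pi \in \Pi} \int \Big(\sum_{i=n}^{N} c(x_i, a_i) - s\Big)^{+} Q(dx' \mid x, f_0(x, s)), \tag{3.47}$$

$$V^*_{n,\infty}(x, s) := \inf_{\pi \in \Pi} \int \Big(\sum_{i=n}^{\infty} c(x_i, a_i) - s\Big)^{+} Q(dx' \mid x, f_0(x, s)). \tag{3.48}$$

Definition 3.1. A sequence of functions $u_n : X_n \to \mathbb{R}$ is called a solution to the optimality
equations if

$$u_n(x) = \inf_{a \in A_n(x)} \big\{ c_n(x, a) + E[u_{n+1}[F_n(x, a, \xi_n)]] \big\}, \tag{3.49}$$

where

$$E[u_{n+1}[F_n(x, a, \xi_n)]] = \int_{S_n} u_{n+1}[F_n(x, a, s)]\, \mu_n(ds). \tag{3.50}$$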
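A single step of the optimality equation (3.49), with the expectation (3.50) evaluated numerically, can be sketched as follows. This is only an illustration under assumptions that are not part of Definition 3.1: the disturbance is taken to be standard normal, the integral is approximated by Gauss-Hermite quadrature, and the infimum is taken over a finite grid of actions. The names `expected_next_value` and `bellman_backup` are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Quadrature nodes and weights for the weight function exp(-z^2 / 2);
# dividing by sqrt(2*pi) turns the weights into N(0, 1) probabilities.
NODES, WEIGHTS = hermegauss(20)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)

def expected_next_value(u_next, F, x, a):
    """Approximate E[u_{n+1}(F_n(x, a, xi))] for xi ~ N(0, 1) (assumed distribution)."""
    return float(np.sum(WEIGHTS * u_next(F(x, a, NODES))))

def bellman_backup(cost, u_next, F, x, action_grid):
    """One step of (3.49): minimize c_n(x, a) + E[u_{n+1}(F_n(x, a, xi))] over a grid."""
    values = [cost(x, a) + expected_next_value(u_next, F, x, a) for a in action_grid]
    best = int(np.argmin(values))
    return values[best], action_grid[best]

if __name__ == "__main__":
    F = lambda x, a, xi: x + a + xi          # example system function
    cost = lambda x, a: x ** 2 + a ** 2      # example one-stage cost
    u_next = lambda y: y ** 2                # example next-stage function
    print(bellman_backup(cost, u_next, F, 2.0, np.linspace(-3.0, 3.0, 601)))
```

With the quadratic example in the `__main__` block, the objective is $2a^2 + 4a + 9$ at $x = 2$, so the grid search should return an action close to $-1$ and a value close to $7$.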
First, we introduce the following notation for simplicity. Let $L_n(X)$ be the family of
l.s.c. non-negative functions on $X$. Moreover, denote

$$P_n u(x) := \min_{a \in A_n(x)} \big\{ c_n(x, a) + E[u_{n+1}[F_n(x, a, \xi_n)]] \big\} \tag{3.51}$$

for all $x \in X$.

Lemma 3.3. Under Assumption 2.1,

• $P_n$ maps $L_{n+1}(X)$ into $L_n(X)$.

• For every $u \in L_{n+1}(X)$, there exists $a^*_n \in \mathbb{F}_n$ such that $a^*_n(x) \in A_n(x)$ attains the
minimum in (3.51), i.e.

$$P_n u(x) = c_n(x, a^*_n(x)) + E[u_{n+1}[F_n(x, a^*_n(x), \xi_n)]]. \tag{3.52}$$

Proof. Let $u \in L_{n+1}(X)$. Then by the assumptions we have that the function

$$(x, a) \mapsto c_n(x, a) + E[u_{n+1}[F_n(x, a, \xi_n)]] \tag{3.53}$$

is nonnegative, l.s.c. and inf-compact on $\mathbb{K}_n$. Hence, by Lemma 3.1, there exists $\pi_n \in \mathbb{F}_n$ that satisfies Equation
(3.52), and $P_n u$ is l.s.c. So we conclude the proof. □

By the dynamic programming principle, we express the optimality equations (3.49) as

$$V^*_m = P_m V^*_{m+1} \tag{3.54}$$

for all $m \ge n$. We continue with the following lemma.

Lemma 3.4. Under Assumption 2.1, consider a sequence $\{u_m\}$ of functions $u_m \in
L_m(X)$ for $m \in \mathbb{N}_0$. Then the following is true: if $u_m \ge P_m u_{m+1}$ for all $m \ge n$, then
$u_m \ge V^*_m$ for all $m \ge n$.

Proof. By the previous lemma, there exists a policy $\pi = \{\pi_m\}_{m \ge n}$ such that for all $m \ge n$

$$u_m(x) \ge c_m(x, \pi_m) + E^{\pi}_x[u_{m+1}(x_{m+1})]. \tag{3.55}$$

By iterating, we have

$$u_m(x) \ge E^{\pi}_x\Big[\sum_{i=m}^{N-1} c_i(x_i, \pi_i)\Big] + E^{\pi}_x[u_N(x_N)]. \tag{3.56}$$

Hence we have

$$u_m(x) \ge V_{m,N}(x, \pi) \tag{3.57}$$

for all $N > 0$. By letting $N \to \infty$, we have $u_m(x) \ge V_m(x, \pi)$ and so $u_m \ge V^*_m$. Hence,
we conclude the proof. □
Theorem 3.5. Suppose that the assumptions hold. Then for every $m \ge n$ and $x \in X$,

$$V^*_{m,N}(x) \uparrow V^*_m(x) \tag{3.58}$$

as $N \to \infty$, and $V^*_m$ is l.s.c.

Proof. We justify the statement by appealing to the dynamic programming algorithm. We
set $J_N(x) := 0$ for all $x \in X_N$ and, going backwards for $t = N - 1, N - 2, \ldots, n$,
let

$$J_t(x) := \inf_{a \in A_t(x)} \big\{ c_t(x, a) + E[J_{t+1}[F_t(x, a, \xi_t)]] \big\}. \tag{3.59}$$

By backward iteration, for $t = N - 1, \ldots, n$, there exists $\pi_t \in \mathbb{F}_t$ such that $\pi_t(x) \in A_t(x)$
attains the minimum in Equation (3.59), and $\{\pi_n, \ldots, \pi_{N-1}\}$ is an optimal policy.
Moreover, $J_n$ is the optimal cost, i.e.

$$J_n(x) = V^*_{n,N}(x). \tag{3.60}$$

Hence, we have

$$V^*_{n,N}(x) = \min_{a \in A_n(x)} \big\{ c_n(x, a) + E[V^*_{n+1,N}[F_n(x, a, \xi_n)]] \big\}. \tag{3.61}$$

Denoting $u(x) = \sup_{N} V^*_{n,N}(x)$, we have that $u(x)$ is l.s.c. By Lemma 3.2 we have

$$V^*_n(x) = \min_{a \in A_n(x)} \big\{ c_n(x, a) + E[V^*_{n+1}[F_n(x, a, \xi_n)]] \big\}. \tag{3.62}$$

Moreover, the cost functions $c_n(x, a)$ being nonnegative, we have $u(x) \le V^*_n(x)$. But by
definition, we have $V^*_n(x) \le u(x)$. Hence, we conclude the proof. □
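The monotone convergence $V^*_{n,N} \uparrow V^*_n$ can be observed numerically with the backward recursion (3.59). The sketch below assumes a hypothetical stationary model with three states and two actions (the arrays `COST` and `TRANS` are invented for illustration); state 2 is absorbing with zero cost so that the total cost stays finite.

```python
import numpy as np

# Hypothetical finite model, for illustration only.
COST = np.array([[1.0, 2.0],
                 [2.0, 0.5],
                 [0.0, 0.0]])                            # COST[x, a] >= 0
TRANS = np.array([[[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]],
                  [[0.5, 0.4, 0.1], [0.3, 0.3, 0.4]],
                  [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]])   # TRANS[x, a, x']

def finite_horizon_values(N):
    """Optimal N-step cost from time 0: start at J_N = 0 and apply
    J_t = min_a {c + E[J_{t+1}]} a total of N times."""
    J = np.zeros(COST.shape[0])
    for _ in range(N):
        # TRANS @ J has shape (states, actions): expected next value per (x, a).
        J = np.min(COST + TRANS @ J, axis=1)
    return J

if __name__ == "__main__":
    for N in (0, 1, 5, 50, 200):
        print(N, finite_horizon_values(N))
```

Because the costs are nonnegative, the printed vectors should be nondecreasing in $N$ componentwise and stabilize as $N$ grows, in line with the statement of the theorem for this toy model.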

Intuitively, the theorem means that whenever $s_n \le 0$ we have the risk neutral control
problem, where the policy is Markovian in the usual sense, and hence whenever $s_n \le 0$ we
can solve the sub-problem after time $n$ using the classical dynamic programming principle.

We treat the classical LQ-problem using the risk sensitive AVaR operator to illustrate our
results below and give a heuristic algorithm that specifies the decision rule at each time
episode $n$ based on our results above.


3.1 A Toolbox Example 

We solve the classical linear system with a quadratic one-stage cost problem with AVaR
criteria. Suppose we take $X = \mathbb{R}$ with the linear system

$$x_{n+1} = x_n + a_n + \xi_n \tag{3.63}$$

with $x_0 = 0$, where the $\xi_n$ are i.i.d. standard normal, i.e. $\xi_n \sim \mathcal{N}(0, 1)$. We take the one-stage cost
functions as $c(x_n, a_n) = x_n^2 + a_n^2$ for $n = 0, 1, \ldots, N - 1$. We also assume that the control
constraint sets $A_n(x)$ with $x \in X$ are all equal to $A_n = \mathbb{R}$. Thus, under the above
assumptions, we wish to find a policy that minimizes the performance criterion

$$J(\pi, x) := \mathrm{AVaR}^{\pi}_{\alpha}\Big(\sum_{n=0}^{N-1} (x_n^2 + a_n^2)\Big). \tag{3.64}$$


It is well known that in the risk neutral case, using dynamic programming, the optimal policy
$\pi^* = \{f_0, \ldots, f_{N-1}\}$ and the value function $J_n$ satisfy the following dynamics:

$$f_{N-1}(x) = 0,$$
$$f_n(x) = -(1 + K_{n+1})^{-1} K_{n+1}\, x,$$
$$K_N = 0,$$
$$K_n = [1 - (1 + K_{n+1})^{-1} K_{n+1}] K_{n+1} + 1, \quad \text{for } n = 0, \ldots, N - 1,$$
$$J_n(x) = K_n x^2 + \sum_{i=n+1}^{N-1} K_i, \quad \text{for } n = 0, \ldots, N - 1$$

(see e.g. [13]). In the risk sensitive case, we proceed as follows. We take $a_n = 0$ for $n =
0, \ldots, N - 1$, i.e. $\pi_0 = \{0, 0, \ldots, 0\}$, and let

$$s := \mathrm{VaR}_{\alpha}\Big(\sum_{n=0}^{N-1} c(X_n, a_n)\Big) := \inf\Big\{x \in \mathbb{R} : \mathbb{P}\Big(\sum_{n=0}^{N-1} X_n^2 \le x\Big) \ge \alpha\Big\}.$$

Then we check the global variable $s$. If $s \le 0$, then we appeal to the risk neutral case
and find the optimal policy accordingly. If $s > 0$, then we choose $a_0 = 0$, which makes the
cost at time 0 minimal by definition. According to the output, we go to time $n = 1$,
update our state to $x_1$, and repeat the procedure for each time episode $n = 0, \ldots, N - 1$.
We give the pseudocode of this algorithm below.
Algorithm 1 LQ Problem with AVaR algorithm
procedure LQ-AVaR Algorithm
    Initialize x_0 = 0 and s = VaR_α(Σ_{n=0}^{N-1} c(X_n, a_n)) under π_0 = {0, ..., 0}
    for each n = 0, ..., N - 1 do
        if s ≤ 0 then
            apply Dynamic Programming from state x_n at time n onwards
        else
            Choose a_n = 0
            Update s = s - c(x_n, a_n)
            Update x_{n+1} = x_n + a_n + ξ_n
        end if
    end for
end procedure
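A runnable sketch of Algorithm 1 is given below, under assumptions that go beyond the text: the threshold $s$ is estimated by Monte Carlo simulation under the zero policy, the one-stage cost is taken as $x_n^2 + a_n^2$, and the noise path is passed in explicitly. The function names (`riccati_gains`, `var_under_zero_control`, `lq_avar_heuristic`) are illustrative and not taken from the paper.

```python
import numpy as np

def riccati_gains(N):
    """Risk neutral recursion: K_N = 0, K_n = [1 - K_{n+1}/(1+K_{n+1})] K_{n+1} + 1,
    with feedback f_n(x) = gain[n] * x and gain[n] = -K_{n+1} / (1 + K_{n+1})."""
    K = np.zeros(N + 1)
    gain = np.zeros(N)
    for n in range(N - 1, -1, -1):
        ratio = K[n + 1] / (1.0 + K[n + 1])
        gain[n] = -ratio
        K[n] = (1.0 - ratio) * K[n + 1] + 1.0
    return K, gain

def var_under_zero_control(N, alpha, n_paths, rng):
    """Monte Carlo estimate of s = VaR_alpha(sum_{n<N} x_n^2) with a_n = 0 and x_0 = 0."""
    xi = rng.standard_normal((n_paths, N))
    x = np.cumsum(xi, axis=1) - xi             # x_n = xi_0 + ... + xi_{n-1}
    costs = np.sort(np.sum(x ** 2, axis=1))
    k = max(int(np.ceil(alpha * n_paths)), 1)
    return costs[k - 1]

def lq_avar_heuristic(N, s, noise, x0=0.0):
    """Follow Algorithm 1 along one noise path: zero control while s > 0,
    risk neutral feedback once the remaining budget s is <= 0."""
    _, gain = riccati_gains(N)
    x, total, actions = x0, 0.0, []
    for n in range(N):
        a = gain[n] * x if s <= 0 else 0.0
        cost = x ** 2 + a ** 2
        s -= cost                               # once s <= 0 it stays <= 0 (costs >= 0)
        total += cost
        actions.append(a)
        x = x + a + noise[n]
    return total, actions

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, alpha = 5, 0.9
    s0 = var_under_zero_control(N, alpha, 20000, rng)
    total, actions = lq_avar_heuristic(N, s0, rng.standard_normal(N))
    print(s0, total, actions)
```

Since the costs are nonnegative, the budget $s$ can only decrease, so switching to the feedback rule inside the loop is the same as "applying dynamic programming from time $n$ onwards" in the pseudocode.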